Clustering Streaming Time Series Using CBC
نویسندگان
چکیده
Clustering streaming time series is a difficult problem. Most traditional algorithms are too inefficient for large amounts of data and outliers in them. In this paper, we propose a new clustering method, which clusters Biclipped (CBC) stream data. It contains three phrases, namely, dimensionality reduction through piecewise aggregate approximation (PAA), Bi-clipped process that clipped the real valued series through bisecting the value field, and clustering. Through related experiments, we find that CBC gains higher quality solutions in less time compared with M-clipped method that clipped the real value series through the mean of them, and unclipped methods. This situation is especially distinct when streaming time series contain outliers.
منابع مشابه
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
متن کاملA MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams
In streaming time series the Clustering problem is more complex, since the dynamic nature of streaming data makes previous clustering methods inappropriate. In this paper, we propose firstly a new method to evaluate Clustering in streaming time series databases. First, we introduce a novel multiresolution PAA (MPAA) transform to achieve our iterative clustering algorithm. The method is based on...
متن کاملClustering Techniques for Streaming Time Series Data
Clustering streaming time series also referred to as a multivariate streaming time series clustering has been researcher’s target for a long time. Yet so far there are many issues banning them to reach a nearly optimal solution. In this paper we investigate these works pointing to some of the issues focusing on the parameters and give some points to be accounted for future resaerches.
متن کاملA Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
متن کاملAn Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007